Skip to content

Add ParallelAsync for concurrent branch execution (DOTNET-8662)#2375

Open
GarrettBeatty wants to merge 5 commits into
devfrom
gcbeatty/durable-parallel
Open

Add ParallelAsync for concurrent branch execution (DOTNET-8662)#2375
GarrettBeatty wants to merge 5 commits into
devfrom
gcbeatty/durable-parallel

Conversation

@GarrettBeatty

@GarrettBeatty GarrettBeatty commented May 14, 2026

Copy link
Copy Markdown
Contributor

#2216

What

Adds parallel branch execution to Amazon.Lambda.DurableExecution. ParallelAsync runs N branches concurrently with configurable concurrency limits and completion policies, returning an IBatchResult<T> with per-branch status and error information. The shared IBatchResult<T> family is reused by MapAsync in Wave 2.

Public API:

Type Purpose
IDurableContext.ParallelAsync<T>(Func[], ...) Run unnamed branches concurrently.
IDurableContext.ParallelAsync<T>(DurableBranch<T>[], ...) Same, but each branch carries an explicit name for traces / test inspection.
DurableBranch<T>(Name, Func) Named-branch record.
ParallelConfig MaxConcurrency, CompletionConfig, NestingType.
CompletionConfig When the batch is considered complete. Factories: AllSuccessful(), FirstSuccessful(), AllCompleted(). Validated MinSuccessful / ToleratedFailureCount / ToleratedFailurePercentage (0.0–1.0).
IBatchResult<T> Per-branch view: All / Succeeded / Failed / Started, GetResults, GetErrors, ThrowIfError, HasFailure, CompletionReason, count properties.
IBatchItem<T> Single-branch record (Index, Name, Status, Result, Error).
BatchItemStatus Succeeded / Failed / Started.
CompletionReason AllCompleted / MinSuccessfulReached / FailureToleranceExceeded.
NestingType Nested (default); Flat reserved for a follow-up PR (throws NotSupportedException today).
ParallelException Thrown when CompletionConfig signals FailureToleranceExceeded; carries the IBatchResult<T>.

Per-branch checkpoint payloads are serialized via the ILambdaSerializer registered on ILambdaContext.Serializer — same pattern as StepAsync / RunInChildContextAsync from #2370. There are no separate reflection / AOT-safe overload pairs: the AOT story is determined entirely by which serializer the user registers with the runtime (e.g., SourceGeneratorLambdaJsonSerializer<TContext>).

Testing

31 new unit tests in ParallelOperationTests.cs and supporting fixtures:

  • CompletionConfig matrix: AllSuccessful, AllCompleted, FirstSuccessful, MinSuccessful, ToleratedFailureCount, ToleratedFailurePercentage — both pass and fail thresholds.
  • Concurrency: MaxConcurrency enforced via semaphore; unbounded when null; cancel-mid-dispatch leaves no orphan branches.
  • Concurrent ExecutionState access regression test (parallel writers do not corrupt the visited-set).
  • Replay determinism: mixed-status replay (SUCCEEDED + FAILED + STARTED), FirstSuccessful with all-fail, named vs. unnamed branches.
  • IBatchResult<T> accessors and GetResults / GetErrors / ThrowIfError semantics.
  • NestingType.Flat throws NotSupportedException (placeholder for follow-up).

6 new integration tests build successfully (require AWS credentials to run): happy path, max-concurrency, first-successful, partial-failure, failure-tolerance, and replay-determinism.


COPY bin/publish/ ${LAMBDA_TASK_ROOT}

ENTRYPOINT ["/var/task/bootstrap"]

COPY bin/publish/ ${LAMBDA_TASK_ROOT}

ENTRYPOINT ["/var/task/bootstrap"]

COPY bin/publish/ ${LAMBDA_TASK_ROOT}

ENTRYPOINT ["/var/task/bootstrap"]

COPY bin/publish/ ${LAMBDA_TASK_ROOT}

ENTRYPOINT ["/var/task/bootstrap"]

COPY bin/publish/ ${LAMBDA_TASK_ROOT}

ENTRYPOINT ["/var/task/bootstrap"]
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-parallel branch from 19c0128 to fa13eef Compare May 14, 2026 21:49
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-wave0 branch from 464c591 to d308c3b Compare May 14, 2026 21:49
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-parallel branch from fa13eef to b7a06b4 Compare May 14, 2026 22:19
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-wave0 branch from d308c3b to be4c3ad Compare May 18, 2026 15:23
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-parallel branch from b7a06b4 to 08b2095 Compare May 18, 2026 15:44
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-wave0 branch 3 times, most recently from ad4d208 to 3acbed5 Compare May 20, 2026 17:46
Base automatically changed from gcbeatty/durable-wave0 to gcbeatty/durable-child-context May 20, 2026 17:46
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-child-context branch 2 times, most recently from 4d97473 to 8a6c41c Compare May 21, 2026 18:56
Base automatically changed from gcbeatty/durable-child-context to feature/durablefunction May 23, 2026 15:58
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-parallel branch 3 times, most recently from e4da00c to 9485866 Compare June 1, 2026 16:40
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-parallel branch from 9485866 to cff0b86 Compare June 5, 2026 15:59
@GarrettBeatty GarrettBeatty requested a review from Copilot June 5, 2026 16:33

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new durable “fan-out” capability to Amazon.Lambda.DurableExecution via IDurableContext.ParallelAsync, enabling concurrent branch execution with configurable completion policies and concurrency limits, while extending replay/state handling to support multi-threaded branch orchestration.

Changes:

  • Introduces ParallelAsync public API (+ supporting types like ParallelConfig, CompletionConfig, IBatchResult<T>, DurableBranch<T>, and ParallelException).
  • Implements Internal/ParallelOperation<T> orchestration and makes ExecutionState thread-safe for concurrent branch execution/replay.
  • Adds extensive unit tests plus multiple end-to-end integration test functions for determinism, throttling, and failure policy behavior.

Reviewed changes

Copilot reviewed 45 out of 45 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
Libraries/test/Amazon.Lambda.DurableExecution.Tests/ParallelOperationTests.cs New unit tests covering ParallelAsync behavior (completion modes, determinism, cancellation, replay).
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelReplayDeterminismFunction/ParallelReplayDeterminismFunction.csproj New integration test function project for replay determinism.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelReplayDeterminismFunction/Function.cs Workflow that validates replay determinism across suspend/resume under parallel branches.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelReplayDeterminismFunction/Dockerfile Container packaging for the replay determinism test function.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelPartialFailureFunction/ParallelPartialFailureFunction.csproj New integration test function project for partial failure behavior.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelPartialFailureFunction/Function.cs Workflow demonstrating AllCompleted with per-branch errors surfaced in the result.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelPartialFailureFunction/Dockerfile Container packaging for the partial failure test function.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelMaxConcurrencyFunction/ParallelMaxConcurrencyFunction.csproj New integration test function project for max concurrency throttling.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelMaxConcurrencyFunction/Function.cs Workflow that uses durable waits to validate max-concurrency throttling.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelMaxConcurrencyFunction/Dockerfile Container packaging for the max concurrency test function.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelHappyPathFunction/ParallelHappyPathFunction.csproj New integration test function project for the basic happy path.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelHappyPathFunction/Function.cs Happy-path parallel workflow returning joined results.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelHappyPathFunction/Dockerfile Container packaging for the happy path test function.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelFirstSuccessfulFunction/ParallelFirstSuccessfulFunction.csproj New integration test function project for FirstSuccessful behavior.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelFirstSuccessfulFunction/Function.cs Workflow exercising FirstSuccessful with staggered durable waits.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelFirstSuccessfulFunction/Dockerfile Container packaging for the FirstSuccessful test function.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelFailureToleranceFunction/ParallelFailureToleranceFunction.csproj New integration test function project for failure tolerance exceeded.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelFailureToleranceFunction/Function.cs Workflow that triggers ParallelException via tolerated failure count.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelFailureToleranceFunction/Dockerfile Container packaging for the failure tolerance test function.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/ParallelReplayDeterminismTest.cs End-to-end test asserting deterministic branch IDs and replayed step outputs.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/ParallelPartialFailureTest.cs End-to-end test asserting AllCompleted surfaces partial failures without failing workflow.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/ParallelMaxConcurrencyTest.cs End-to-end test asserting max concurrency throttling via timestamp clustering.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/ParallelHappyPathTest.cs End-to-end happy-path test asserting correct service-side context history.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/ParallelFirstSuccessfulTest.cs End-to-end test asserting short-circuit semantics and winner reporting.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/ParallelFailureToleranceTest.cs End-to-end test asserting workflow fails with ParallelException on exceeded tolerance.
Libraries/src/Amazon.Lambda.DurableExecution/ParallelConfig.cs New configuration type for parallel execution (concurrency, completion, nesting).
Libraries/src/Amazon.Lambda.DurableExecution/Operation.cs Adds wire subtypes for Parallel parent and ParallelBranch checkpoints.
Libraries/src/Amazon.Lambda.DurableExecution/NestingType.cs New enum describing checkpoint graph representation for parallel/map branches.
Libraries/src/Amazon.Lambda.DurableExecution/Internal/ParallelSummary.cs Internal checkpoint payload schema for parallel parent summary.
Libraries/src/Amazon.Lambda.DurableExecution/Internal/ParallelOperation.cs Core implementation of durable parallel orchestration + replay reconstruction.
Libraries/src/Amazon.Lambda.DurableExecution/Internal/ParallelJsonContext.cs Source-generated STJ context for AOT-safe serialization of internal summary payload.
Libraries/src/Amazon.Lambda.DurableExecution/Internal/ExecutionState.cs Makes execution state thread-safe via a single lock for concurrent branch access.
Libraries/src/Amazon.Lambda.DurableExecution/Internal/BatchResult.cs Default internal implementation of IBatchResult<T> views and helpers.
Libraries/src/Amazon.Lambda.DurableExecution/Internal/BatchItem.cs Default internal implementation of IBatchItem<T>.
Libraries/src/Amazon.Lambda.DurableExecution/IDurableContext.cs Public interface additions for ParallelAsync overloads.
Libraries/src/Amazon.Lambda.DurableExecution/IBatchResult.cs New public batch result interfaces used by parallel/map.
Libraries/src/Amazon.Lambda.DurableExecution/IBatchItem.cs New public per-branch/item interface surfaced by batch results.
Libraries/src/Amazon.Lambda.DurableExecution/DurableExecutionException.cs Adds ParallelException carrying completion reason and batch result.
Libraries/src/Amazon.Lambda.DurableExecution/DurableContext.cs Implements ParallelAsync and refactors child-context factory creation.
Libraries/src/Amazon.Lambda.DurableExecution/DurableBranch.cs New public record representing a named parallel branch.
Libraries/src/Amazon.Lambda.DurableExecution/CompletionReason.cs New enum describing why a batch resolved.
Libraries/src/Amazon.Lambda.DurableExecution/CompletionConfig.cs New completion policy type with factories (AllSuccessful/AllCompleted/FirstSuccessful).
Libraries/src/Amazon.Lambda.DurableExecution/BatchItemStatus.cs New enum describing per-branch outcome status (Succeeded/Failed/Started).
Docs/durable-execution-design.md Documentation updates reflecting new parallel types and failure-percentage config.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread Libraries/src/Amazon.Lambda.DurableExecution/CompletionConfig.cs Outdated
Comment thread Libraries/src/Amazon.Lambda.DurableExecution/CompletionConfig.cs
Comment thread Libraries/src/Amazon.Lambda.DurableExecution/BatchItemStatus.cs
Comment thread Libraries/src/Amazon.Lambda.DurableExecution/IBatchResult.cs
@GarrettBeatty GarrettBeatty marked this pull request as ready for review June 5, 2026 17:00
@GarrettBeatty GarrettBeatty requested review from a team as code owners June 5, 2026 17:00
@GarrettBeatty GarrettBeatty requested review from normj and philasmar and removed request for a team June 5, 2026 17:00
/// </summary>
/// <remarks>
/// Not yet implemented in the .NET SDK; passing this value throws
/// <see cref="System.NotSupportedException"/>.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do in another pr

@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-parallel branch from 1a1d5bc to 69364a0 Compare June 8, 2026 15:44
@philasmar

Copy link
Copy Markdown
Collaborator

Important

  1. ExecutionState single-lock may bottleneck for high branch counts

The switch from ConcurrentDictionary to a single lock is well-reasoned (compound operations in LoadFromCheckpoint need atomicity). Each critical section is O(1). For the branch
counts in this PR's tests (3-10), contention is negligible. However, the lock is on the read path (GetOperation, IsReplaying) that every branch hits during replay. If MapAsync
(mentioned as a follow-up) dispatches hundreds of items, this could become measurable. No action needed now — just flag for the MapAsync PR to benchmark.

  1. Dispatched branches are never cancelled on short-circuit

The design explicitly states: once dispatched, branches are never cancelled — even after a MinSuccessful or FailureToleranceExceeded short-circuit fires. This guarantees replay
determinism (all dispatched branches end in a terminal state), but it means with MaxConcurrency=null and FirstSuccessful, all branches execute to completion even though only
the first one's result matters. The parallel.md doc doesn't mention this tradeoff. Consider adding a note like: "Dispatched branches always run to completion for replay
determinism — use MaxConcurrency to limit wasted compute under FirstSuccessful."

  1. ParallelException.Result is type-erased (IBatchResult not IBatchResult)

public IBatchResult? Result { get; init; }

Since ParallelException isn't generic, the result is boxed behind the non-generic IBatchResult marker. Callers who catch ParallelException need to cast:
(IBatchResult)ex.Result. This is fine, but worth noting in the exception's XML doc that the cast is expected (it's mentioned on ParallelException but a snippet would
help users).

@GarrettBeatty

Copy link
Copy Markdown
Contributor Author

Important

1. ExecutionState single-lock may bottleneck for high branch counts

The switch from ConcurrentDictionary to a single lock is well-reasoned (compound operations in LoadFromCheckpoint need atomicity). Each critical section is O(1). For the branch counts in this PR's tests (3-10), contention is negligible. However, the lock is on the read path (GetOperation, IsReplaying) that every branch hits during replay. If MapAsync (mentioned as a follow-up) dispatches hundreds of items, this could become measurable. No action needed now — just flag for the MapAsync PR to benchmark.

2. Dispatched branches are never cancelled on short-circuit

The design explicitly states: once dispatched, branches are never cancelled — even after a MinSuccessful or FailureToleranceExceeded short-circuit fires. This guarantees replay determinism (all dispatched branches end in a terminal state), but it means with MaxConcurrency=null and FirstSuccessful, all branches execute to completion even though only the first one's result matters. The parallel.md doc doesn't mention this tradeoff. Consider adding a note like: "Dispatched branches always run to completion for replay determinism — use MaxConcurrency to limit wasted compute under FirstSuccessful."

3. ParallelException.Result is type-erased (IBatchResult not IBatchResult)

public IBatchResult? Result { get; init; }

Since ParallelException isn't generic, the result is boxed behind the non-generic IBatchResult marker. Callers who catch ParallelException need to cast: (IBatchResult)ex.Result. This is fine, but worth noting in the exception's XML doc that the cast is expected (it's mentioned on ParallelException but a snippet would help users).

fixed number 2 only, the rest are already documented 3179531

@@ -0,0 +1,15 @@
namespace Amazon.Lambda.DurableExecution.Internal;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add license header

@@ -0,0 +1,80 @@
namespace Amazon.Lambda.DurableExecution.Internal;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add license header

@@ -0,0 +1,15 @@
using System.Text.Json.Serialization;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add license header

@@ -0,0 +1,654 @@
using System.IO;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add license header

@@ -0,0 +1,31 @@
using System.Text.Json.Serialization;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add license header

@@ -0,0 +1,72 @@
using System.Linq;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add license header

@@ -0,0 +1,76 @@
using System.Linq;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add license header

@@ -0,0 +1,74 @@
using System.Linq;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add license header

@@ -0,0 +1,122 @@
using System.Linq;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add license header

@@ -0,0 +1,1163 @@
using Amazon.Lambda.DurableExecution;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add license header

/// <remarks>
/// <see cref="NestingType.Flat"/> is not yet supported in the .NET SDK and
/// will throw <see cref="System.NotSupportedException"/> when the parallel
/// operation is invoked.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be implemented in #2409

/// <see cref="CompletionReason.FailureToleranceExceeded"/> does the parallel
/// throw.
/// </remarks>
internal sealed class ParallelOperation<T> : DurableOperation<IBatchResult<T>>

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this whole class is going to get refactored later on in #2408 to support mapasync as well

{
try
{
await Task.WhenAll(inFlight).ConfigureAwait(false);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed on the side. We need a mechanism to communicate to the inflight branches that they should cleanly shut down. For example provide them their own cancellation token that we cancel from here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added in a shortCircuitCts which does this.

shortCircuitCts is per-parallel CancellationTokenSource that gets Cancel()ed the moment a CompletionConfig short-circuit is decided (at a dispatch-loop break or in onComplete), and its token flows only into each running branch's user Func

@GarrettBeatty GarrettBeatty Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i also wired in the workflowcancellation where applicable too

Comment thread Libraries/src/Amazon.Lambda.DurableExecution/Internal/ParallelOperation.cs Outdated
@normj normj mentioned this pull request Jun 10, 2026
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-parallel branch 2 times, most recently from c3162e0 to c88ad8a Compare June 11, 2026 17:54
Base automatically changed from feature/durablefunction to dev June 11, 2026 17:56
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-parallel branch from c88ad8a to d347067 Compare June 12, 2026 15:24
@GarrettBeatty GarrettBeatty requested a review from normj June 12, 2026 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants